Outliers Detection for Regression using K-Means and Expected Maximization Methods in Time Series Data
Authors
Abstract
The evolution of computing technology and the ever-increasing size and variety of data sets have created a new range of problems and challenges for data analysts, as well as new opportunities for intelligent systems in data analysis. This study performs an experimental analysis to find regression-based outliers and influential points using two standard data clustering algorithms: expectation maximization (EM) and K-means. The BSE Sensex time series was used for clustering and outlier detection. The results were evaluated on three parameters measured at the point of convergence of the K-means and EM algorithms: the number of iterations, the computation time, and the memory consumed. Outliers and influential points were detected. The results show that the quick, premature convergence of the EM algorithm does not guarantee optimal results, while K-means was found not to guarantee convergence.
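The abstract does not say exactly how the clustering is tied to the regression. The following is a minimal sketch under stated assumptions, not the authors' method:

- The series is regressed on its time index by ordinary least squares.
- K-means (k = 2) and a two-component Gaussian mixture fitted by EM are each run on the absolute residuals, and the high-residual cluster is flagged as outliers.
- Cook's distance with the common 4/n cutoff is a swapped-in standard measure for influential points, not necessarily the paper's criterion.
- The data are a hypothetical synthetic series, not BSE Sensex prices.
- The functions `ols_fit`, `cooks_distance`, `kmeans_1d`, `em_gmm_1d` and `profile` are illustrative names.

Iterations, wall-clock time and peak traced memory mirror the three evaluation parameters named above.

```python
import math
import time
import tracemalloc


def ols_fit(t, y):
    """Ordinary least squares for y = a + b*t; returns (a, b)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    s_xx = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / s_xx
    return y_bar - b * t_bar, b


def cooks_distance(t, residuals):
    """Cook's distance per point for a simple linear regression (p = 2 parameters)."""
    n, p = len(t), 2
    t_bar = sum(t) / n
    s_xx = sum((ti - t_bar) ** 2 for ti in t)
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    out = []
    for ti, e in zip(t, residuals):
        h = 1 / n + (ti - t_bar) ** 2 / s_xx  # leverage h_ii
        out.append(e * e / (p * s2) * h / (1 - h) ** 2)
    return out


def kmeans_1d(x, k=2, max_iter=100):
    """Lloyd's K-means on scalars (k >= 2); returns (centroids, labels, iterations)."""
    xs = sorted(x)
    centroids = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]  # spread-out init
    for it in range(1, max_iter + 1):
        labels = [min(range(k), key=lambda j: abs(xi - centroids[j])) for xi in x]
        new = []
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:  # assignments are stable, so centroids stopped moving
            return centroids, labels, it
        centroids = new
    return centroids, labels, max_iter


def em_gmm_1d(x, max_iter=200, tol=1e-8):
    """EM for a 2-component 1-D Gaussian mixture; returns (means, P(component 1), iterations)."""
    n, xs = len(x), sorted(x)
    m = sum(x) / n
    mu = [xs[n // 2], xs[-1]]  # init: median and maximum
    var = [sum((xi - m) ** 2 for xi in x) / n] * 2
    w, prev_ll, resp = [0.5, 0.5], -math.inf, [0.0] * n
    for it in range(1, max_iter + 1):
        ll = 0.0  # E-step: posterior of the high-mean component for each point
        for i, xi in enumerate(x):
            dens = [w[j] / math.sqrt(2 * math.pi * var[j])
                    * math.exp(-((xi - mu[j]) ** 2) / (2 * var[j])) for j in range(2)]
            ll += math.log(sum(dens))
            resp[i] = dens[1] / sum(dens)
        for j, r in enumerate(([1 - p for p in resp], resp)):  # M-step
            nj = sum(r)
            w[j] = nj / n
            mu[j] = sum(rj * xi for rj, xi in zip(r, x)) / nj
            var[j] = max(sum(rj * (xi - mu[j]) ** 2 for rj, xi in zip(r, x)) / nj, 1e-6)
        if ll - prev_ll < tol:  # stop once the log-likelihood gain is negligible
            return mu, resp, it
        prev_ll = ll
    return mu, resp, max_iter


def profile(fn, *args):
    """Run fn(*args); return (result, seconds elapsed, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical synthetic series (not BSE Sensex data): trend + small noise + two outliers.
t = list(range(30))
y = [2 + 0.5 * ti + 0.3 * math.sin(ti) for ti in t]
y[10] += 15
y[22] -= 12

a, b = ols_fit(t, y)
residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
abs_res = [abs(e) for e in residuals]  # assumption: cluster the absolute residuals

(cents, labels, km_iter), km_sec, km_mem = profile(kmeans_1d, abs_res)
km_out = {i for i, lab in enumerate(labels) if lab == cents.index(max(cents))}
(mus, post, em_iter), em_sec, em_mem = profile(em_gmm_1d, abs_res)
em_out = {i for i, r in enumerate(post) if r > 0.5}
influential = {i for i, d in enumerate(cooks_distance(t, residuals)) if d > 4 / len(t)}

print("K-means outliers:", sorted(km_out), km_iter, "iterations", f"{km_sec:.1e} s", km_mem, "B")
print("EM outliers:     ", sorted(em_out), em_iter, "iterations", f"{em_sec:.1e} s", em_mem, "B")
print("Influential (Cook's D > 4/n):", sorted(influential))
```

The two routines stop for different reasons. K-means stops when the centroids stop moving. EM stops when the log-likelihood gain falls below `tol`. Neither rule guarantees a globally optimal clustering, and the `max_iter` caps bound the runtime if a run does not settle.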
Similar resources
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. But the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased parameter estimates, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Outlier Detection and Influential Point Observation in Linear Regression Using Clustering Techniques in Financial Time Series Data
Modern computing technology makes data gathering and storage easier. This creates a new range of problems and challenges for data analysis. Detection of outliers in time series data has gained much attention in recent years. We present a new approach to outlier detection based on clustering techniques. The Expectation Maximization clusters (EM-Cluster) algorithm is used to find the “optimal” p...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Introduction Package CircOutlier For Detection of Outliers in Circular-Circular Regression
One of the most important problems in any statistical analysis is the existence of unexpected observations. Some observations are not part of the study and are known as outliers. Studies have shown that outliers affect the performance of standard statistical methods in models and predictions. The point of this work is to provide a couple of statistical packages in R software to identi...
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over a long-term period. However, solar radiation measurements are not available in all countries of the world due to technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...